The HAM10000 Dataset: A Large Collection of Multi-Source Dermatoscopic Images of Common Pigmented Skin Lesions

Authors

  • Philipp Tschandl
  • Cliff Rosendahl
  • Harald Kittler
Abstract

Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 (“Human Against Machine with 10000 training images”) dataset. We collected dermatoscopic images from different populations, acquired and stored by different modalities. Given this diversity, we had to apply different acquisition and cleaning methods and developed semiautomatic workflows utilizing specifically trained neural networks. The final dataset consists of 11788 dermatoscopic images, of which 10010 will be released as a training set for academic machine learning purposes and will be publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions. More than 50% of lesions have been confirmed by pathology; the ground truth for the rest of the cases was either follow-up, expert consensus, or confirmation by in-vivo confocal microscopy.

arXiv:1803.10417v1 [cs.CV] 28 Mar 2018

Background & Summary

Dermatoscopy is a widely used diagnostic technique that improves the diagnosis of benign and malignant pigmented skin lesions in comparison to examination with the unaided eye [1]. Dermatoscopic images are also a suitable source for training artificial neural networks to diagnose pigmented skin lesions automatically. As early as 1994, Binder et al. [2] successfully used dermatoscopic images to train an artificial neural network to differentiate melanomas, the deadliest type of skin cancer, from melanocytic nevi. Although the results were promising, the study, like most earlier studies, suffered from a small sample size and the lack of dermatoscopic images other than melanoma or nevi.
Recent advances in graphics card capabilities and machine learning techniques set new benchmarks with regard to the complexity of neural networks and raised expectations that automated diagnostic systems able to diagnose all kinds of pigmented skin lesions without the need for human expertise will soon be available [3]. Training neural-network-based diagnostic algorithms requires a large number of annotated images [4], but the number of high-quality dermatoscopic images with reliable diagnoses is limited or restricted to only a few classes of diseases. In 2013, Mendonça et al. made 200 dermatoscopic images available as the PH2 dataset, including 160 nevi and 40 melanomas [5]. Pathology was the ground truth for melanomas but was not available for most nevi. Because the set is publicly available¹ and includes comprehensive metadata, it has served as a benchmark dataset for studies of the computer diagnosis of melanoma until now. Accompanying the book Interactive Atlas of Dermoscopy [6], a CD-ROM is commercially available with digital versions of 1044 dermatoscopic images, including 167 images of non-melanocytic lesions and 20 images of diagnoses not covered in the HAM10000 dataset. Although this is one of the most diverse available datasets with regard to covered diagnoses, its use is probably limited because of its constrained accessibility. The ISIC archive² is a collection of multiple databases and currently includes 13786 dermatoscopic images³. Because of its permissive licensing (CC-0), well-structured availability, and large size, it is currently the standard source for dermatoscopic image analysis research. It is, however, biased towards melanocytic lesions (12893 of 13786 images are nevi or melanomas). Because this portal is the most comprehensive, technically advanced, and accessible resource, we will provide our dataset through the ISIC archive.
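To make the composition figures quoted above easier to compare, the short Python sketch below computes each named subset's share of its dataset. The counts are copied from the text (PH2, the Interactive Atlas of Dermoscopy CD-ROM, the ISIC archive as of February 12, 2018, and the planned HAM10000 training release from the abstract); the `DatasetCount` class and the `share` helper are hypothetical names introduced only for this illustration, not part of any released tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCount:
    """Hypothetical record pairing a dataset's total size with one subset count."""
    name: str
    total: int         # total number of dermatoscopic images in the dataset
    subset: int        # number of images in the subset described by subset_label
    subset_label: str


# Counts as stated in the text; the grouping into this list is illustrative only.
COUNTS = [
    DatasetCount("PH2", 200, 40, "melanomas"),
    DatasetCount("Interactive Atlas of Dermoscopy", 1044, 167, "non-melanocytic lesions"),
    DatasetCount("ISIC archive (Feb 12, 2018)", 13786, 12893, "nevi or melanomas"),
    DatasetCount("HAM10000", 11788, 10010, "planned training release"),
]


def share(count: DatasetCount) -> float:
    """Return the subset's share of the dataset total, as a percentage."""
    return 100.0 * count.subset / count.total


if __name__ == "__main__":
    # Print one summary line per dataset, rounded to one decimal place.
    for c in COUNTS:
        print(f"{c.name}: {c.subset}/{c.total} {c.subset_label} = {share(c):.1f}%")
```

For the ISIC archive, 12893 of 13786 images works out to roughly 93.5%, which puts a number on the melanocytic bias described in the paragraph above; PH2, by contrast, contains 20% melanomas.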
Because of the limitations of available datasets, past research focused on melanocytic lesions (i.e., the differentiation between melanoma and nevus) and disregarded non-melanocytic pigmented lesions, although they are common in practice. The mismatch between the limited diversity of available training data and the variety of real-life data resulted in only moderate performance of automated diagnostic systems in the clinical setting, despite excellent performance in experimental settings [3, 5, 7, 8]. Building a classifier for multiple diseases is more challenging than binary classification [9]. Currently, reliable multi-class predictions are available only for clinical images of skin diseases, not for dermatoscopic images [10, 11]. To boost research on automated diagnosis of dermatoscopic images, we release the HAM10000 (“Human Against Machine with 10000 training images”) dataset. The dataset will be provided to the participants of the ISIC 2018 classification challenge hosted by the annual MICCAI conference in Granada, …

¹ http://www.fc.up.pt/addi/
² https://isic-archive.com/
³ Number as of February 12, 2018

Image format standardisation

Similar sources

Accuracy of the first step of the dermatoscopic 2-step algorithm for pigmented skin lesions

OBJECTIVES To evaluate the frequency of misclassifications of equivocal pigmented lesions according to the first step of the dermatoscopic 2-step algorithm. PATIENTS AND METHODS 707 consecutive cases from 553 patients of central Europe and Australia were included in the study. Dermatoscopic images were evaluated in a blinded fashion for the presence of features described in the 2-step algorit...

A Study of Dermatoscopic Features in Facial Melanosis and Its Clinical Co-relation – an Observation Study

Facial melanosis is a common presentation in Indian patients, causing cosmetic disfigurement with considerable psychological impact. Diagnosis is generally based on history & clinical features. There is considerable overlap in features amongst various clinical entities. More or less well-defined entities can be easily recognized; however, many transitional forms defy classification. An enorm...

Non-melanoma skin cancer diagnosis with a convolutional neural network

Background: The most common types of non-melanoma skin cancer are basal cell carcinoma (BCC) and squamous cell carcinoma (SCC). AKIEC (actinic keratoses, also called solar keratoses, and intraepithelial carcinoma, also called Bowen’s disease) are common non-invasive precursors of SCC, which may progress to invasive SCC if left untreated. Due to the importance of early detection in cancer treatment, this study aimed...

Pigmented Skin Lesion Biopsies After Computer-Aided Multispectral Digital Skin Lesion Analysis.

BACKGROUND The incidence of melanoma has been rising over the past century. With 37% of patients presenting to their primary care physician with at least 1 skin problem, primary care physicians and other nondermatologist practitioners have substantial opportunity to make an impact at the forefront of the disease process. New diagnostic aids have been developed to augment physician analysis of s...

Classification of the Pigmented Skin lesions in Dermoscopic Images by Shape Features Extraction

Differentiation of benign and malignant (melanoma) pigmented skin lesions is difficult even for dermatologists; thus, in this paper, a new analysis of dermatoscopic images is proposed. Segmentation, feature extraction and classification are the major steps of image analysis. In the segmentation step, we use an improved FFCM-based segmentation method (our previous work) to achieve ...

Publication date: 2018